TextPipe: Online Help
    Unicode Pattern Reference
 



 

 

Regular expression search engine for text in UCS2 form, taking surrogates into account. This implementation is an improved translation of the URE package written by Mark Leisher, who used a variation of the RE->DFA algorithm by Mark Hopkins.

Assumptions

  1. The regular expression and the text are already normalized.
  2. Conversion to lower case assumes a 1-1 mapping.

Definitions

Separator - any one of U+2028 (line separator), U+2029 (paragraph separator), NL (U+000A), CR (U+000D).

Operators

  • . - match any character
  • * - match zero or more of the last sub expression
  • + - match one or more of the last sub expression
  • ? - match zero or one of the last sub expression
  • () - sub expression grouping
  • {m, n} - match at least m and at most n occurrences
    Note: either value can be 0 or omitted. An upper bound that is 0 or omitted means no upper limit; an omitted lower bound means 0. Consequently:
      {,}, {0,} and {0, 0} correspond to *
      {, 1} and {0, 1} correspond to ?
      {1,} and {1, 0} correspond to +
  • {m} - match exactly m occurrences
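
The bound rules for {m, n} can be sketched in Python as follows. This is only an illustration of the rules stated above, not TextPipe's own code; the function names normalize_bounds and shorthand are hypothetical, and an omitted value is represented by None.

```python
from typing import Optional, Tuple


def normalize_bounds(m: Optional[int], n: Optional[int]) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for {m, n}; a maximum of None means unlimited."""
    lower = m if m is not None else 0   # omitted lower bound counts as 0
    upper = n if n else None            # 0 or omitted upper bound: no limit
    return lower, upper


def shorthand(m: Optional[int], n: Optional[int]) -> str:
    """Return the equivalent single-character operator, or '' if there is none."""
    table = {(0, None): "*", (0, 1): "?", (1, None): "+"}
    return table.get(normalize_bounds(m, n), "")


if __name__ == "__main__":
    for m, n in [(None, None), (0, 0), (None, 1), (1, 0), (2, 5)]:
        print((m, n), "->", normalize_bounds(m, n), repr(shorthand(m, n)))
```

Under these assumptions {1, 0} and {1,} both normalize to "one or more", which is why both correspond to +.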

Notes

The "." operator normally does not match separators, but a flag is available that will allow this operator to match a separator.
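
As a rough Python sketch of this behaviour (the names SEPARATORS, dot_matches and match_separators are illustrative and are not TextPipe identifiers), the "." operator can be modelled as a test against the separator set from the Definitions section:

```python
# Separators as defined above: U+2028, U+2029, NL and CR.
SEPARATORS = frozenset({"\u2028", "\u2029", "\n", "\r"})


def dot_matches(ch: str, match_separators: bool = False) -> bool:
    """Model of '.': matches any character, separators only if the flag is set."""
    return match_separators or ch not in SEPARATORS


if __name__ == "__main__":
    print(dot_matches("a"))                          # ordinary character
    print(dot_matches("\n"))                         # separator, flag off
    print(dot_matches("\n", match_separators=True))  # separator, flag on
```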

Literals and Constants

  • c - literal UCS2 character
  • \x.... - hexadecimal number of up to 4 digits
  • \X.... - hexadecimal number of up to 4 digits
  • \u.... - hexadecimal number of up to 4 digits
  • \U.... - hexadecimal number of up to 4 digits
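
One way to read such a constant is sketched below in Python. It assumes that the digits are consumed greedily, up to four of them, and that at least one digit is required; the help text does not state either point, and the function name parse_hex_constant is hypothetical.

```python
from typing import Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_hex_constant(pattern: str, pos: int) -> Tuple[int, int]:
    """Parse \\x, \\X, \\u or \\U followed by 1 to 4 hex digits at pattern[pos].

    Returns (code point, index of the first character after the constant).
    """
    if pattern[pos:pos + 1] != "\\" or pattern[pos + 1:pos + 2] not in ("x", "X", "u", "U"):
        raise ValueError("not a hexadecimal constant")
    start = pos + 2
    end = start
    # Assumption: take as many hex digits as are present, but never more than 4.
    while end < min(start + 4, len(pattern)) and pattern[end] in HEX_DIGITS:
        end += 1
    if end == start:
        raise ValueError("no hexadecimal digits after the escape")
    return int(pattern[start:end], 16), end


if __name__ == "__main__":
    code_point, next_pos = parse_hex_constant(r"\U10A", 0)
    print(hex(code_point), next_pos)
```

Because at most four digits are read, a fifth hexadecimal character after the constant is treated as an ordinary literal in this sketch.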

Character classes

  • [...] - Character class
  • [^...] - Negated character class
  • \pN1,N2,...,Nn - Character properties class
  • \PN1,N2,...,Nn - Negated character properties class

POSIX character classes recognized: :alnum: :alpha: :cntrl: :digit: :graph: :lower: :print: :punct: :space: :upper: :xdigit:

Notes

  • Character property classes are \p or \P followed by a comma-separated list of integers between 0 and the maximum entry index in TCharacterCategory. These integers correspond directly to the TCharacterCategory enumeration entries. Note: the upper, lower and title case classes require case-sensitive search to be enabled in order to match correctly!
  • Character classes can contain literals, constants and character property classes. Example:

    [abc\U10A\p0,13,4]
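
To show how the example class above breaks down into its parts, here is a small Python sketch. It is a simplified reading, not TextPipe's parser: it handles only literals, hexadecimal constants and \p lists, ignores \P and POSIX names, and assumes greedy reading of up to four hex digits. The function name parse_class_body is hypothetical.

```python
from typing import List, Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_class_body(body: str) -> Tuple[List[str], List[int]]:
    """Split the text between [ and ] into characters and property numbers."""
    chars: List[str] = []
    props: List[int] = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch == "\\" and nxt in ("x", "X", "u", "U"):
            # Hexadecimal constant: up to 4 digits (assumed greedy).
            j = i + 2
            while j < min(i + 6, len(body)) and body[j] in HEX_DIGITS:
                j += 1
            if j == i + 2:
                raise ValueError("no hexadecimal digits after the escape")
            chars.append(chr(int(body[i + 2:j], 16)))
            i = j
        elif ch == "\\" and nxt == "p":
            # Property list: comma-separated integers.
            j = i + 2
            while j < len(body) and body[j] in "0123456789,":
                j += 1
            props.extend(int(tok) for tok in body[i + 2:j].split(",") if tok)
            i = j
        else:
            chars.append(ch)  # plain literal character
            i += 1
    return chars, props


if __name__ == "__main__":
    print(parse_class_body(r"abc\U10A\p0,13,4"))
```

Read this way, the example contains the literals a, b and c, the constant U+010A, and the property classes 0, 13 and 4.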

 

 © 1999-2005 Crystal Software. All rights reserved.